Experience in Extending Query Engine for Continuous Analytics

Authors

  • Qiming Chen
  • Meichun Hsu
Abstract

 HP Laboratories, HPL-2010-44
 In-Database Stream Processing

 Combining data warehousing and stream processing technologies has great potential for offering low-latency, data-intensive analytics. Unfortunately, this convergence has not been properly addressed so far. The current generation of stream processing systems is generally built separately from the data warehouse and query engine. This separation can cause significant overhead in data access and data movement, and it prevents such systems from taking advantage of the functionality already offered by existing data warehouse systems. In this work we tackle some hard problems, not properly addressed previously, in integrating stream analytics capability into an existing query engine. We define an extended SQL query model that unifies queries over both static relations and dynamic streaming data, and we develop techniques to generalize query engines to support the unified model. We propose the cut-and-rewind query execution model, which allows a query to be applied to stream data by converting the stream into a sequence of "chunks" and executing the query over each chunk sequentially, without shutting the query instance down between chunks. We also propose the cycle-based transaction model to support Continuous Querying with Continuous Persisting (CQCP) with cycle-based isolation and visibility. We have prototyped our approach by extending PostgreSQL. This work has resulted in a new kind of tightly integrated, highly efficient system with advanced stream processing capability as well as full DBMS functionality. We demonstrate the system with the popular Linear Road benchmark and report its performance. By leveraging the more mature codebase of a query engine to the maximal extent, we can significantly reduce the engineering investment needed to develop the streaming technology. Providing this capability on the HP SeaQuest parallel analytics engine is work in progress.

 External Posting Date: May 21, 2010. Internal Posting Date: May 21, 2010. Approved for External Publication.
 Copyright 2010 Hewlett-Packard Development Company, L.P.
 Qiming Chen and Meichun Hsu, HP Labs, Hewlett Packard Co., Palo Alto, California, USA
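
The cut-and-rewind idea described in the abstract can be illustrated with a small sketch. This is not the authors' PostgreSQL extension; it is a toy Python model under stated assumptions: the stream is ordered by timestamp, chunks are fixed-length time windows, and the "query" is a per-key sum. The names `Event`, `cut_into_chunks`, `ChunkQuery`, and `continuous_query` are hypothetical and chosen only for illustration.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event schema: (timestamp_seconds, key, value).
Event = Tuple[int, str, float]


def cut_into_chunks(stream: Iterable[Event], chunk_seconds: int) -> Iterator[List[Event]]:
    """Cut a timestamp-ordered stream into consecutive time-window chunks."""
    for _, events in groupby(stream, key=lambda e: e[0] // chunk_seconds):
        yield list(events)


class ChunkQuery:
    """One long-lived query instance that is rewound, not restarted, per chunk."""

    def __init__(self) -> None:
        self.cycles = 0                    # survives across chunks
        self.sums: Dict[str, float] = {}   # per-chunk state, reset on rewind

    def run(self, chunk: List[Event]) -> Dict[str, float]:
        """Evaluate the query (here: sum of values per key) over one chunk."""
        for _, key, value in chunk:
            self.sums[key] = self.sums.get(key, 0.0) + value
        self.cycles += 1
        return dict(self.sums)

    def rewind(self) -> None:
        """Clear per-chunk state while keeping the instance alive."""
        self.sums = {}


def continuous_query(stream: Iterable[Event], chunk_seconds: int = 60):
    """Apply the same query instance to each chunk in sequence."""
    query = ChunkQuery()
    results = []
    for chunk in cut_into_chunks(stream, chunk_seconds):
        results.append(query.run(chunk))
        query.rewind()
    return results, query.cycles
```

In this sketch the counter `cycles` stands in for state that outlives a chunk, while `sums` stands in for state that is reset at each chunk boundary. How the actual system decides chunk boundaries and which state it retains is not specified in the abstract.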

Similar resources

A user study of web search session behaviour using eye tracking data

In this paper we present and empirically evaluate a user study using a web search log and eye tracking to measure user behaviour during a query session, that is, a sequence of user queries, results page views and content page views, in order to find a specific piece of information. We evaluate different tasks, in terms of those who found the correct information, and in terms of the query sessio...

Improving Europeana Search Experience Using Query Logs

Europeana is a long-term project funded by the European Commission with the goal of making Europe’s cultural and scientific heritage accessible to the public. Since 2008, about 1500 institutions have contributed to Europeana, enabling people to explore the digital resources of Europe’s museums, libraries and archives. The huge amount of collected multi-lingual multi-media data is made available...

Continuous Analytics: Rethinking Query Processing in a Network-Effect World

Modern data analysis applications driven by the Network Effect are pushing traditional database and data warehousing technologies beyond their limits due to their massively increasing data volumes and demands for low latency. To address this problem, we advocate an integrated query processing approach that runs SQL continuously and incrementally over data before that data is stored in the datab...

Stream-temporal Querying with Ontologies

Recent years have seen theoretical and practical efforts on temporalizing and streamifying ontology-based data access (OBDA). This paper contributes to the practical efforts with a description/evaluation of a prototype implementation for the stream-temporal query language framework STARQL. STARQL serves the needs for industrially motivated scenarios, providing the same interface for querying hi...

Learning Recurrent Event Queries for Web Search

Recurrent event queries (REQ) constitute a special class of search queries occurring at regular, predictable time intervals. The freshness of documents ranked for such queries is generally of critical importance. REQ forms a significant volume, as much as 6% of query traffic received by search engines. In this work, we develop an improved REQ classifier that could provide significant improvemen...

Publication date: 2010